Module 1.1 - Learning With Derivatives¶

Training Data¶

  • Set of datapoints, each $(x,y)$
In [2]:
split_graph(s1, s2)
Out[2]:

Math¶

  • Linear Model

$$m(x; w, b) = x_1 \times w_1 + x_2 \times w_2 + b $$

In [3]:
def forward(self, x1: float, x2: float) -> float:
    return self.w1 * x1 + self.w2 * x2 + self.b

Model 1¶

  • Linear Model
In [4]:
@dataclass
class Linear:
    # Parameters
    w1: float
    w2: float
    b: float

    def forward(self, x1: float, x2: float) -> float:
        return self.w1 * x1 + self.w2 * x2 + self.b

Model 1¶

In [5]:
model = Linear(1, 1, -0.9)
draw_graph(model)
Out[5]:

Distance¶

  • Sign of $m(x)$: correct or incorrect; magnitude $|m(x)|$: distance from the boundary
In [6]:
with_points(s1, s2, Linear(1, 1, -0.4))
Out[6]:

Log Sigmoid loss¶

In [7]:
def point_loss(x):
    return -math.log(minitorch.operators.sigmoid(-x))
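A quick sanity check of this loss (with a self-contained sigmoid in place of `minitorch.operators.sigmoid`, so nothing needs to be installed): the loss is near zero when the input is very negative and grows roughly linearly when it is positive.

```python
import math

def sigmoid(x: float) -> float:
    # Logistic function, 1 / (1 + e^{-x}).
    return 1.0 / (1.0 + math.exp(-x))

def point_loss(x: float) -> float:
    # -log(sigmoid(-x)) = log(1 + e^x), i.e. softplus(x).
    return -math.log(sigmoid(-x))

print(point_loss(-3.0))  # ~0.049: confidently on the correct side, tiny loss
print(point_loss(0.0))   # ~0.693: exactly on the boundary
print(point_loss(3.0))   # ~3.049: confidently on the wrong side, large loss
```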
In [8]:
graph(point_loss, [], [])
Out[8]:

Lecture Quiz¶

Outline¶

  • Model Fit
  • Derivatives
  • Module 1

Model Fitting¶

Start¶

In [9]:
hcat([show(Linear(1, 1, -1.0)),
      show(Linear(1, 1, -0.5))], 0.3)
Out[9]:

Goal¶

  • Find parameters that minimize loss
  • Finalize a fixed model

Fitting¶

  • Field of optimization
  • Many, many different approaches
  • Our focus: gradient descent

Parameter Fitting¶

  1. Compute the loss function, $L(w_1, w_2, b)$
  2. See how small changes would change the loss
  3. Update the parameters to locally reduce the loss
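The three steps above can be sketched as a gradient-descent loop. This is a hand-rolled sketch, not minitorch's implementation: the central-difference helper, the toy quadratic loss, and the learning rate are all illustrative choices.

```python
from typing import Callable, List

def central_difference(f: Callable[[float], float], x: float,
                       eps: float = 1e-5) -> float:
    # Estimate f'(x) by evaluating f at nearby points.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def gradient_descent_step(loss: Callable[[List[float]], float],
                          params: List[float], lr: float = 0.1) -> List[float]:
    # Step 2: see how a small change in each parameter changes the loss.
    grads = []
    for i in range(len(params)):
        def loss_i(v: float, i: int = i) -> float:
            p = params.copy()
            p[i] = v
            return loss(p)
        grads.append(central_difference(loss_i, params[i]))
    # Step 3: move each parameter against its slope to locally reduce the loss.
    return [p - lr * g for p, g in zip(params, grads)]

# Toy loss over (w1, w2, b), minimized at w1=1, w2=2, b=3 (illustration only).
toy_loss = lambda p: (p[0] - 1) ** 2 + (p[1] - 2) ** 2 + (p[2] - 3) ** 2
params = [0.0, 0.0, 0.0]
for _ in range(100):
    params = gradient_descent_step(toy_loss, params)
print([round(p, 2) for p in params])  # approaches [1.0, 2.0, 3.0]
```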

Example: Update Bias¶

In [10]:
model1 = Linear(1, 1, -0.4)
model2 = Linear(1, 1, -0.5)
In [11]:
compare(model1, model2)
Out[11]:
→

Step 1: Compute Loss¶

In [12]:
with_points(s1, s2, Linear(1, 1, -1.5))
Out[12]:
In [13]:
def point_loss(x):
    return -math.log(minitorch.operators.sigmoid(-x))

Full Loss¶

In [14]:
def full_loss(m):
    l = 0
    for x, y in zip(s.X, s.y):
        l += point_loss(-y * m.forward(*x))
    return l
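A self-contained illustration of summing per-point losses over a toy dataset (the dataset, labels $y \in \{-1, +1\}$, and model values are all made up for this example):

```python
import math
from dataclasses import dataclass

def point_loss(x: float) -> float:
    # -log(sigmoid(-x)) = log(1 + e^x)
    return math.log(1.0 + math.exp(x))

@dataclass
class Linear:
    w1: float
    w2: float
    b: float

    def forward(self, x1: float, x2: float) -> float:
        return self.w1 * x1 + self.w2 * x2 + self.b

# Toy dataset: two points per class, labels in {-1, +1}.
X = [(0.1, 0.1), (0.2, 0.3), (0.8, 0.9), (0.7, 0.8)]
y = [-1, -1, 1, 1]

def full_loss(m: Linear) -> float:
    return sum(point_loss(-yi * m.forward(*xi)) for xi, yi in zip(X, y))

# A boundary that separates the classes gives a lower total loss.
print(full_loss(Linear(1, 1, -1.0)) < full_loss(Linear(1, 1, -2.0)))  # True
```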
In [15]:
hcat([graph(point_loss, [], [-2, -0.2, 1]),
      graph(lambda x: point_loss(-x), [-1, 0.4, 1.3], [])], 0.3)
Out[15]:

Step 2: Find Direction¶

In [16]:
hcat([show(Linear(1, 1, -1.5)),
      show(Linear(1, 1, -1.45))], 0.3)
Out[16]:

Step 3: Update Parameters¶

In [17]:
set_svg_height(500)
show_loss(full_loss, Linear(1, 1, 0))
Out[17]:

Our Challenge¶

How do we find the right direction?

Symbolic Derivatives¶

Function Notation¶

$$f(x) = \sin(2 x)$$

In [18]:
plot_function("f(x) = sin(2x)", lambda x: math.sin(2 * x))

Symbolic Derivative¶

$$f(x) = \sin(2 x) \Rightarrow f'(x) = 2 \cos(2 x)$$

In [19]:
plot_function("f'(x) = 2*cos(2x)", lambda x: 2 * math.cos(2 * x))

Multiple Arguments¶

$$f(x, y) = \sin(x) + 2 \cos(y)$$

In [20]:
plot_function3D("f(x, y) = sin(x) + 2 * cos(y)", lambda x,y: math.sin(x) + 2 * math.cos(y))

Derivatives with Multiple Arguments¶

$$f_x'(x, y) = \cos(x) \ \ \ f_y'(x, y) = -2 \sin(y)$$

In [21]:
plot_function3D("f'_x(x, y) = cos(x)", lambda x, y: math.cos(x))

Review: Derivative¶

$$f(x) = x^2 + 1$$

In [22]:
def f(x):
    return x * x + 1.0

plot_function("f(x)", f)

Review: Derivative¶

$$f'(x) = 2x$$

In [23]:
def d_f(x):
    return 2 * x

def tangent_line(slope, x, y):
    def line(x_):
        return slope * (x_ - x) + y

    return line

plot_function("f(x) vs f'(2)",
              f, fn2=tangent_line(d_f(2), 2, f(2)))

Review: Symbolic Derivatives¶

Expectation: Apply basic derivative rules.

  • Differentiation Rules

Numerical Derivatives¶

What if we don't have symbols?¶

$$f(x) = ...$$ $$f'(x) = ...$$

Derivative as higher-order function¶

$$f(x) = ...$$ $$f'(x) = ...$$

In [24]:
from typing import Callable
def derivative(f: Callable[[float], float]) -> Callable[[float], float]:
    ...
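One way to fill in this stub is a sketch based on finite differences (the step size `eps` is an arbitrary choice, and this is not the minitorch implementation):

```python
import math
from typing import Callable

def derivative(f: Callable[[float], float],
               eps: float = 1e-5) -> Callable[[float], float]:
    # Return a new function that approximates f' via a central difference.
    def d_f(x: float) -> float:
        return (f(x + eps) - f(x - eps)) / (2 * eps)
    return d_f

# Check against the symbolic answer: d/dx sin(2x) = 2 cos(2x).
d_sin2x = derivative(lambda x: math.sin(2 * x))
print(abs(d_sin2x(0.5) - 2 * math.cos(1.0)) < 1e-6)  # True
```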

Definition of Derivative: Geometry¶

$$f'(x) = \lim_{\epsilon \rightarrow 0} \frac{f(x + \epsilon) - f(x)}{\epsilon}$$

Central Difference¶

Approximate derivative

$$f'(x) \approx \frac{f(x + \epsilon) - f(x-\epsilon)}{2\epsilon}$$

Approximating Derivative¶

Key Idea: Only need to call $f$.

In [25]:
def central_difference(f: Callable[[float], float], x: float) -> float:
    ...
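A possible implementation of this stub (the step size `1e-5` is an arbitrary choice, trading truncation error against floating-point round-off):

```python
from typing import Callable

def central_difference(f: Callable[[float], float], x: float,
                       eps: float = 1e-5) -> float:
    # f'(x) ≈ (f(x + eps) - f(x - eps)) / (2 * eps); only calls f.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# For f(x) = x^2 + 1, f'(3) = 6.
print(abs(central_difference(lambda v: v * v + 1.0, 3.0) - 6.0) < 1e-6)  # True
```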

Multiple Arguments¶

Turn a 2-argument function into a 1-argument function.

In [26]:
def f(x, y):
    ...

def d_f_x(x, y):
    def inner(x):
        return f(x, y)
    return central_difference(inner, x)
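The same pattern with a concrete two-argument function (both the function `f(x, y) = x*y + y**2` and the central-difference helper are illustrative choices, included so the sketch is self-contained):

```python
from typing import Callable

def central_difference(f: Callable[[float], float], x: float,
                       eps: float = 1e-5) -> float:
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Concrete example: f(x, y) = x * y + y^2, so df/dx = y.
def f(x: float, y: float) -> float:
    return x * y + y ** 2

def d_f_x(x: float, y: float) -> float:
    # Fix y and vary only x, turning f into a 1-argument function.
    def inner(x_: float) -> float:
        return f(x_, y)
    return central_difference(inner, x)

print(round(d_f_x(2.0, 3.0), 4))  # 3.0, since df/dx = y
```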

Example¶

In [27]:
plot_function("sigmoid", minitorch.operators.sigmoid)

Example¶

In [28]:
def d_sigmoid(x):
    return minitorch.central_difference(minitorch.operators.sigmoid, x)

plot_function("Derivative of sigmoid", d_sigmoid)

Module-1¶

Module-1 Learning Objectives¶

  • Practical understanding of derivatives
  • Dive into autodifferentiation
  • Parameters and their usage

Module-1: What is it?¶

  • Numerical and symbolic derivatives
  • Implement our numerical class
  • Implement autodifferentiation
  • Everything is scalars for now (no "gradients")

Module-1 Overview¶

  • 5 Tasks
  • Module 1

Q&A¶